feat(disk): ntfy alert when host disk fills up #92
Merged
Adds a best-effort disk-usage guard that runs once per orchestrator tick (before any work, so it fires even when Claude is rate-limited). When the partition backing CLAYDE_DISK_ALERT_PATH (default /data, same volume as the host root) reaches the threshold (default 85%), it posts a warning to the existing ntfy topic. Repeat alerts are rate-limited by a cooldown (default 6h) persisted in /data/disk_alert_state.json so the 5-minute tick loop does not spam the same warning.

Motivated by recurring full-disk incidents on clayde.net: Claude Code leaks ~26 MB plugin-marketplace temp dirs per refresh (1100+ accumulated to 13 GB). This surfaces the condition early instead of failing silently.

Everything is configurable via CLAYDE_DISK_ALERT_* and best-effort: any error is logged, never raised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
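For readers who want the threshold-plus-cooldown logic in concrete form, here is a minimal sketch. It is not the code in src/clayde/disk.py: the function names and the `last_alert_ts` state field are assumptions made for illustration, and only `shutil.disk_usage`, the 85% default, and the 6h cooldown come from the description above.

```python
import json
import shutil
from pathlib import Path


def usage_pct(path: str) -> int:
    """Return how full the partition backing `path` is, as a whole percent."""
    usage = shutil.disk_usage(path)
    return int(usage.used * 100 / usage.total)


def should_alert(pct: int, threshold_pct: int, state_file: Path,
                 cooldown_s: float, now: float) -> bool:
    """True when pct has reached the threshold and the cooldown has elapsed."""
    if pct < threshold_pct:
        return False
    try:
        # "last_alert_ts" is a hypothetical field name for this sketch.
        last = float(json.loads(state_file.read_text())["last_alert_ts"])
    except (OSError, ValueError, KeyError, TypeError):
        last = None  # missing or unreadable state: treat as never alerted
    if last is not None and now - last < cooldown_s:
        return False  # still inside the cooldown window
    return True


def record_alert(state_file: Path, now: float) -> None:
    """Persist the time of the alert so later ticks can honour the cooldown."""
    state_file.write_text(json.dumps({"last_alert_ts": now}))
```

A tick would call `usage_pct`, then `should_alert` with `time.time()` as `now`, and call `record_alert` only after an alert was actually sent. Passing `now` explicitly keeps the cooldown logic easy to test without patching the clock.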
max-tet
requested changes
Jun 22, 2026
    headers = {
        "Title": f"clayde.net disk {usage_pct}% full",
        "Priority": "5",
        "Tags": "floppy_disk,warning",
Collaborator
floppy_disk? Seriously? 😀
Owner
Author
Gone 😄 — reusing send_ntfy means the tag is set there (rotating_light for warnings). No more floppy_disk.
        log.warning("could not persist disk alert state: %s", exc)


    def _send(settings: Settings, *, usage_pct: int, free_gb: float) -> None:
Collaborator
I believe there should be a helper method for sending via ntfy somewhere. Use it if there is one.
Owner
Author
Done — now delegates to webhook.notify.send_ntfy instead of the inline httpx POST. Added a test asserting the helper is called (success=False, right topic).
Addresses review: drop the ad-hoc httpx POST (and the floppy_disk tag) in disk._send and delegate to webhook.notify.send_ntfy. success=False gives warning styling (priority 5, rotating_light). Driven via asyncio.run from the sync tick loop, which is safe since main() never runs inside an active loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
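The pattern in this commit, a synchronous caller driving an async helper with asyncio.run, might look roughly like the sketch below. The signature of webhook.notify.send_ntfy is not shown in this thread, so `send_ntfy_stub` is a hypothetical stand-in that only records its arguments. The message body format and the `send_disk_alert` name are assumptions too. The title string is the one from the reviewed diff.

```python
import asyncio
import logging

log = logging.getLogger("disk")

sent = []  # stand-in for the network: records what would have been posted


async def send_ntfy_stub(topic: str, title: str, message: str, *,
                         success: bool) -> None:
    """Hypothetical stand-in for send_ntfy; the real signature may differ."""
    sent.append({"topic": topic, "title": title,
                 "message": message, "success": success})


def send_disk_alert(topic: str, usage_pct: int, free_gb: float) -> None:
    """Sync wrapper: run the async helper to completion and never raise."""
    try:
        # asyncio.run creates a fresh event loop. It raises RuntimeError if
        # one is already running in this thread, which the except below logs.
        asyncio.run(send_ntfy_stub(
            topic,
            f"clayde.net disk {usage_pct}% full",
            f"{free_gb:.1f} GB free",
            success=False,  # per the commit message, False means warning styling
        ))
    except Exception as exc:  # best-effort: log and carry on
        log.warning("could not send disk alert: %s", exc)
```

Catching a broad `Exception` here mirrors the "logged, never raised" contract stated in the PR. A narrower except clause would let unexpected errors escape into the tick loop.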
Why
clayde.net keeps hitting a full disk. Root cause this round: Claude Code leaks ~26 MB plugin-marketplace temp_* dirs on every refresh; 1132 of them, 13 GB, had piled up under clayde-claude/plugins/marketplaces/. Cleaned manually (disk 91% → 56%), but the failure mode is silent: a full disk breaks clones, builds, and this agent loop with no warning.

This PR makes Clayde warn before the disk fills, since Clayde already runs continuously on that host.
What
A best-effort disk guard (src/clayde/disk.py) called once per orchestrator tick, before any work, so it fires even when Claude is rate-limited (disk fills regardless of usage limits).

- Checks shutil.disk_usage(CLAYDE_DISK_ALERT_PATH); /data is a bind-mount on the host root partition, so it reflects host disk fullness.
- When usage reaches CLAYDE_DISK_ALERT_THRESHOLD_PCT (default 85), posts a warning to the existing ntfy topic.
- Repeat alerts are held back for CLAYDE_DISK_ALERT_COOLDOWN_S (default 6h), persisted in /data/disk_alert_state.json, so the 5-min tick loop doesn't spam.

New config keys (all optional, sane defaults): CLAYDE_DISK_ALERT_ENABLED, _THRESHOLD_PCT, _PATH, _COOLDOWN_S.

Out of scope
Does not auto-delete anything; this PR is alert only. The leak itself is a Claude Code bug worth filing upstream, and auto-sweeping leaked temp dirs is a separate decision.
Test
tests/test_disk.py: threshold boundary, cooldown suppression, re-alert after cooldown, disabled, and usage-error swallowing. Full suite: 348 passed.

Recommended reading order
1. src/clayde/config.py: 4 new settings
2. src/clayde/disk.py: the guard
3. src/clayde/orchestrator.py: hook in main()
4. tests/test_disk.py
5. CLAUDE.md: config table + module doc

🤖 Generated with Claude Code
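As orientation before opening src/clayde/config.py, the four CLAYDE_DISK_ALERT_* keys could be modelled as in the sketch below. This is an illustration, not the project's Settings class: the dataclass, the loader function, and the accepted truthy strings are assumptions. The PR states defaults for the threshold (85), path (/data), and cooldown (6h) but not for ENABLED, so `enabled=True` is assumed here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DiskAlertSettings:
    enabled: bool = True          # assumption: the PR does not state this default
    threshold_pct: int = 85       # alert once usage reaches this percentage
    path: str = "/data"           # partition to inspect
    cooldown_s: int = 6 * 60 * 60  # minimum seconds between repeat alerts


def load_disk_alert_settings(
    env: Mapping[str, str] = os.environ,
) -> DiskAlertSettings:
    """Build settings from CLAYDE_DISK_ALERT_* variables, else use defaults."""
    defaults = DiskAlertSettings()
    prefix = "CLAYDE_DISK_ALERT_"
    enabled_raw = env.get(prefix + "ENABLED")
    if enabled_raw is None:
        enabled = defaults.enabled
    else:
        # Hypothetical parsing rule: a small set of truthy strings.
        enabled = enabled_raw.strip().lower() in {"1", "true", "yes", "on"}
    return DiskAlertSettings(
        enabled=enabled,
        threshold_pct=int(env.get(prefix + "THRESHOLD_PCT",
                                  defaults.threshold_pct)),
        path=env.get(prefix + "PATH", defaults.path),
        cooldown_s=int(env.get(prefix + "COOLDOWN_S", defaults.cooldown_s)),
    )
```

Taking the environment as a parameter lets tests pass a plain dict instead of mutating `os.environ`.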